Disclaimer: This report has been written for the author's learning purposes only and uses open data from Public Health Scotland under the UK Open Government Licence (OGL).
To inform the planning and provision of cancer treatment services by analysing breast cancer incidence data reported by NHS Borders.
Between 1997 and 2021, breast cancer had the third-highest number of incidences of any cancer type reported by NHS Borders. In this period, breast cancer in males made up less than 1% of total breast cancer incidences, so this report will focus on incidences among females.
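# Packages used by the code in this report
library(tidyverse) # dplyr verbs and ggplot2
library(gt)        # gt(), tab_header(), cols_label(), tab_options(), data_color()
library(gtExtras)  # gt_highlight_rows()
library(sf)        # needed for geom_sf()
library(plotly)    # ggplotly(), layout()
library(infer)     # specify(), hypothesize(), generate(), calculate(), get_p_value()
cancer_incidence_borders %>%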
filter(cancer_site != "All Cancer Types",
sex == "All") %>%
group_by(cancer_site) %>%
summarise(total_incidences = sum(incidences_all_ages)) %>%
arrange(desc(total_incidences)) %>%
filter(total_incidences > 2000) %>%
gt() %>%
tab_header(title = md("**Total Cancer Incidences by Cancer Site**"),
subtitle = "NHS Borders (1997-2021): Sites w/ Over 2000 Total Incidences") %>%
cols_label(
cancer_site = "Cancer Site",
total_incidences = "Total Incidences") %>%
tab_options(column_labels.font.weight = "bold",
table.align = "left") %>%
gt_highlight_rows(rows = 3)
Total Cancer Incidences by Cancer Site
NHS Borders (1997-2021): Sites w/ Over 2000 Total Incidences

| Cancer Site | Total Incidences |
|---|---|
| Non-Melanoma Skin Cancer | 6174 |
| Basal Cell Carcinoma Of The Skin | 4049 |
| Breast | 2614 |
| Trachea, Bronchus And Lung | 2534 |
| Colorectal Cancer | 2514 |
| Squamous Cell Carcinoma Of The Skin | 2075 |
cancer_incidence_borders %>%
filter(cancer_site == "Breast",
sex != "All") %>%
group_by(sex) %>%
summarise(total_incidences = sum(incidences_all_ages)) %>%
arrange(desc(total_incidences)) %>%
head(3) %>%
gt() %>%
tab_header(title = md("**Breast Cancer Incidences by Sex**"),
subtitle = "NHS Borders (1997-2021)") %>%
cols_label(
sex = "Sex",
total_incidences = "Total Incidences") %>%
tab_options(column_labels.font.weight = 'bold',
table.align = "left")
Breast Cancer Incidences by Sex
NHS Borders (1997-2021)

| Sex | Total Incidences |
|---|---|
| Female | 2598 |
| Male | 16 |
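
As a quick check of the "less than 1%" figure quoted earlier, the male share can be computed directly from the two totals in the table above. This is a minimal base R sketch; the names `breast_by_sex` and `share_pct` are illustrative and not part of the original analysis.

```r
# Totals copied from the table above (NHS Borders, 1997-2021)
breast_by_sex <- c(Female = 2598, Male = 16)

# Share of each sex as a percentage of all breast cancer incidences
share_pct <- breast_by_sex / sum(breast_by_sex) * 100

round(share_pct, 2)
```

The male share works out at roughly 0.6% of the 2614 recorded incidences.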
According to NHS Borders data, breast cancer among females has the highest number of incidences and the highest mean European age-standardised rate (EASR) of any cancer type.
cancer_incidence_borders %>%
filter(cancer_site != "All Cancer Types",
sex == "Female") %>%
group_by(cancer_site) %>%
summarise(total_incidences = sum(incidences_all_ages)) %>%
arrange(desc(total_incidences)) %>%
head(3) %>%
gt() %>%
tab_header(title = md("**Female Cancer Incidences**"),
subtitle = "NHS Borders (1997-2021)") %>%
cols_label(
cancer_site = "Cancer Site",
total_incidences = "Total Incidences") %>%
tab_options(column_labels.font.weight = "bold") %>%
gt_highlight_rows(rows = 1)
Female Cancer Incidences
NHS Borders (1997-2021)

| Cancer Site | Total Incidences |
|---|---|
| Breast | 2598 |
| Non-Melanoma Skin Cancer | 2519 |
| Basal Cell Carcinoma Of The Skin | 1882 |
cancer_incidence_borders %>%
filter(cancer_site != "All Cancer Types",
sex == "Female") %>%
group_by(cancer_site) %>%
summarise(mean_easr = mean(easr)) %>%
arrange(desc(mean_easr)) %>%
head(3) %>%
gt() %>%
tab_header(title = md("**Female EASR by Cancer Type**"),
subtitle = "NHS Borders (1997-2021)") %>%
cols_label(
cancer_site = "Cancer Site",
mean_easr = "Mean EASR") %>%
tab_options(column_labels.font.weight = "bold") %>%
gt_highlight_rows(rows = 1)
Female EASR by Cancer Type
NHS Borders (1997-2021)

| Cancer Site | Mean EASR |
|---|---|
| Breast | 161.3640 |
| Non-Melanoma Skin Cancer | 150.3996 |
| Basal Cell Carcinoma Of The Skin | 113.9178 |
To understand how these rates compare with those of other health boards in Scotland, we can visualise the EASR over a five-year period. The EASR is the European age-standardised incidence rate per 100,000 person-years at risk.
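
For readers unfamiliar with age standardisation, the sketch below shows the general idea of direct standardisation: age-specific rates are weighted by a fixed standard population so that areas with different age structures can be compared. The three age bands, case counts, person-years and weights are all made up for illustration. The published EASR figures are based on the European Standard Population and are taken from the open data as supplied, not produced by this code.

```r
# Hypothetical inputs for three broad age bands (NOT real NHS Borders data)
age_band     <- c("0-49", "50-69", "70+")
cases        <- c(10, 60, 40)           # incident cases in each band
person_years <- c(30000, 15000, 8000)   # person-years at risk in each band
std_weights  <- c(60000, 28000, 12000)  # illustrative standard population

# Age-specific rates per 100,000 person-years
age_rates <- cases / person_years * 100000
names(age_rates) <- age_band

# Crude rate: ignores the age structure of the population
crude_rate <- sum(cases) / sum(person_years) * 100000

# Directly standardised rate: weighted mean of the age-specific rates
standardised_rate <- sum(age_rates * std_weights) / sum(std_weights)

round(c(crude = crude_rate, standardised = standardised_rate), 1)
```

With these invented numbers the crude and standardised rates differ because the hypothetical population is older than the illustrative standard, which is the effect standardisation is designed to remove.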
geo_summary %>%
ggplot(aes(fill = easr)) +
geom_sf(colour = "white", linewidth = 0.04) +
labs(
title = "Female Breast Cancer EASR (2017-2021)",
subtitle = "By NHS Health Board",
fill = "EASR") +
scale_fill_distiller(palette = "Blues", direction = +1) +
theme(plot.title = element_text(size = 15, face = "bold"),
plot.subtitle = element_text(size = 10),
legend.title = element_text(face = "bold"),
panel.background = element_rect(fill = "white"),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
rect = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank())
NB: Unfortunately, data for the individual health boards NHS Western Isles, NHS Shetland and NHS Orkney were not available at the time of report completion.
five_year_summary %>%
select(hb, cancer_site, sex, year, easr) %>%
filter(sex == "Female",
cancer_site == "Breast",
hb != "GR0800001") %>%
left_join(geography_codes, "hb") %>%
select(hb_name, easr) %>%
arrange(desc(easr)) %>%
gt() %>%
tab_header(title = md("**Female Breast Cancer EASR (2017-2021)**")) %>%
cols_label(
hb_name = "Health Board",
easr = "EASR") %>%
tab_options(column_labels.font.weight = 'bold') %>%
data_color(columns = easr, palette = "Blues")
Female Breast Cancer EASR (2017-2021)

| Health Board | EASR |
|---|---|
| NHS Dumfries and Galloway | 174.6153 |
| NHS Lothian | 172.3179 |
| NHS Forth Valley | 171.6585 |
| NHS Lanarkshire | 169.1486 |
| NHS Greater Glasgow and Clyde | 168.8007 |
| NHS Borders | 164.8136 |
| NHS Fife | 164.4207 |
| NHS Tayside | 163.4222 |
| NHS Highland | 162.5039 |
| NHS Ayrshire and Arran | 157.0019 |
| NHS Grampian | 156.2987 |
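
To place NHS Borders among the listed boards, the values in the table above can be ranked and compared with their simple average. This is a base R sketch using figures copied from the table. The column names `rank` and `diff_from_mean` are chosen here, and the unweighted mean of the boards is only a rough reference point, not the national rate.

```r
# EASR values copied from the table above (female breast cancer, 2017-2021)
easr_by_board <- data.frame(
  hb_name = c("NHS Dumfries and Galloway", "NHS Lothian", "NHS Forth Valley",
              "NHS Lanarkshire", "NHS Greater Glasgow and Clyde", "NHS Borders",
              "NHS Fife", "NHS Tayside", "NHS Highland",
              "NHS Ayrshire and Arran", "NHS Grampian"),
  easr = c(174.6153, 172.3179, 171.6585, 169.1486, 168.8007, 164.8136,
           164.4207, 163.4222, 162.5039, 157.0019, 156.2987)
)

# Rank 1 = highest EASR
easr_by_board$rank <- rank(-easr_by_board$easr)

# Difference from the simple (unweighted) mean of the listed boards
easr_by_board$diff_from_mean <- easr_by_board$easr - mean(easr_by_board$easr)

easr_by_board[easr_by_board$hb_name == "NHS Borders", ]
```

NHS Borders ranks sixth of the eleven boards listed and sits slightly below their unweighted mean.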
fig1_plot <- cancer_incidence_borders %>%
filter(sex == "Female",
cancer_site %in% c("All Cancer Types", "Breast")) %>%
ggplot() +
geom_line(aes(x = year, y = incidences_all_ages, colour = cancer_site, group = cancer_site,
text = paste0("<b>Year:</b> ", year, "<br>",
"<b>Type:</b> ", cancer_site, "<br>",
"<b>Incidences:</b> ", incidences_all_ages)),
size = 1.5) +
scale_x_continuous(breaks = c(1997:2021)) +
scale_colour_manual(values = colour_scheme, labels = c("All Combined", "Breast")) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
ylim(0, 500) +
labs(
x = "\n Year",
y = "Incidences\n",
title = "Female Cancer Incidences",
colour = "Cancer Type:") +
theme(panel.background = element_rect(fill = "white"),
panel.grid = element_line(colour = "grey90"))
ggplotly(fig1_plot, tooltip = "text") %>%
layout(hovermode = "x unified",
title = list(text = paste0("<b>Female Cancer Incidences</b>",
"<br>",
"<sup>",
"NHS Borders: 1997-2021",
"</sup>")))
What does this visualisation tell us?
When we look at the year-on-year percentage changes in breast cancer incidences we can gain further insights. The table below shows:
Why might there be a three-year trend?
Women who meet the screening criteria are invited for breast screening once every three years (NHS National Services Scotland, 2022).
Why do we not see the peak in 2020 that we might have expected?
Due to the COVID-19 pandemic, no invitations to breast screening were sent between 30 March 2020 and 3 August 2020 (Public Health Scotland, 2022).
cancer_incidence_borders %>%
filter(sex == "Female",
cancer_site == "Breast") %>%
select(year, sex, cancer_site, incidences_all_ages) %>%
mutate(yearly_pct_change = round((incidences_all_ages - lag(incidences_all_ages)) / lag(incidences_all_ages) * 100)) %>%
gt() %>%
cols_label(
year = "Year",
sex = "Sex",
cancer_site = "Cancer Site",
incidences_all_ages = "No. of Incidences",
yearly_pct_change = "% Change from Previous Year") %>%
tab_options(column_labels.font.weight = 'bold') %>%
gt_highlight_rows(rows = c(3, 6, 9, 12, 15, 18, 21, 24),
bold_target_only = TRUE,
target_col = yearly_pct_change)
| Year | Sex | Cancer Site | No. of Incidences | % Change from Previous Year |
|---|---|---|---|---|
| 1997 | Female | Breast | 71 | NA |
| 1998 | Female | Breast | 69 | -3 |
| 1999 | Female | Breast | 133 | 93 |
| 2000 | Female | Breast | 69 | -48 |
| 2001 | Female | Breast | 81 | 17 |
| 2002 | Female | Breast | 131 | 62 |
| 2003 | Female | Breast | 67 | -49 |
| 2004 | Female | Breast | 62 | -7 |
| 2005 | Female | Breast | 179 | 189 |
| 2006 | Female | Breast | 68 | -62 |
| 2007 | Female | Breast | 55 | -19 |
| 2008 | Female | Breast | 154 | 180 |
| 2009 | Female | Breast | 94 | -39 |
| 2010 | Female | Breast | 86 | -9 |
| 2011 | Female | Breast | 157 | 83 |
| 2012 | Female | Breast | 103 | -34 |
| 2013 | Female | Breast | 114 | 11 |
| 2014 | Female | Breast | 130 | 14 |
| 2015 | Female | Breast | 90 | -31 |
| 2016 | Female | Breast | 98 | 9 |
| 2017 | Female | Breast | 136 | 39 |
| 2018 | Female | Breast | 97 | -29 |
| 2019 | Female | Breast | 98 | 1 |
| 2020 | Female | Breast | 107 | 9 |
| 2021 | Female | Breast | 149 | 39 |
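
The percentage change column is the difference from the previous year divided by the previous year's value, multiplied by 100 and rounded to a whole number. The base R sketch below reproduces the first four rows of the table above. The report's own code uses `lag()`. The `c(NA, head(...))` shift used here is an equivalent that needs no packages, and the variable names are illustrative.

```r
# First four years copied from the table above
incidences <- c(`1997` = 71, `1998` = 69, `1999` = 133, `2000` = 69)

# Previous year's value; the first year has no predecessor, so it is NA
previous <- c(NA, head(incidences, -1))

# Year-on-year percentage change, rounded as in the table
pct_change <- round((incidences - previous) / previous * 100)

pct_change
```

For example, the jump from 69 incidences in 1998 to 133 in 1999 is (133 - 69) / 69 * 100, which rounds to 93.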
Question: Is the greater number of female breast cancer incidences in “peak years” (1999, 2002, 2005, 2008, 2011, 2014, 2017) compared to “non-peak years” (all other years between 1997 and 2021) statistically significant?
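
The peak years named in the question are exactly three years apart, which is the pattern a three-yearly screening round would produce. As a small illustration in base R (the names `peak_years` and `peak` are chosen here), the sequence can be generated rather than typed out and then used to label each year:

```r
years <- 1997:2021

# Every third year from 1999 to 2017, matching the peak years in the question
peak_years <- seq(1999, 2017, by = 3)

# Label each year; anything not in the sequence is treated as "standard"
peak <- ifelse(years %in% peak_years, "peak", "standard")

table(peak)
```

This labels 7 years as peak and the remaining 18 as standard.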
cancer_incidence_borders_sample <- cancer_incidence_borders %>%
filter(sex == "Female", cancer_site == "Breast") %>%
select(id, cancer_site, sex, year, incidences_all_ages) %>%
mutate(peak = case_when(
year == 1999 ~ "peak",
year == 2002 ~ "peak",
year == 2005 ~ "peak",
year == 2008 ~ "peak",
year == 2011 ~ "peak",
year == 2014 ~ "peak",
year == 2017 ~ "peak",
TRUE ~ "standard"
)
)
# Observed difference in mean incidences (peak years minus standard years)
observed_stat <- cancer_incidence_borders_sample %>%
specify(incidences_all_ages ~ peak) %>%
calculate(stat = "diff in means", order = c("peak", "standard"))
# Null distribution: permute the peak labels 1000 times under independence
null_distribution <- cancer_incidence_borders_sample %>%
specify(response = incidences_all_ages, explanatory = peak) %>%
hypothesize(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
calculate(stat = "diff in means", order = c("peak", "standard"))
# One-sided p-value: share of permuted differences at least as large as the observed one
p_value <- null_distribution %>%
get_p_value(obs_stat = observed_stat, direction = "right")
Test Used: Two Sample Mean Test (Independent)
Significance Level: 0.05
H0: \(\mu_{1} - \mu_{2} = 0\)
H1: \(\mu_{1} - \mu_{2} > 0\)
Result: Based on a null distribution generated by permutation, the test returns a very low p-value, below the 0.05 significance level. We therefore reject H0 in favour of H1: the evidence suggests that the mean number of female breast cancer incidences is significantly higher in “peak years”.
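
To make the mechanics of the permutation test concrete, here is a base R sketch of the same idea using the yearly counts from the table earlier in this report. It is an illustration rather than a reproduction of the `infer` pipeline: the seed, the number of permutations and the helper name `diff_in_means` are choices made here, so the exact p-value will not match the original run.

```r
# Female breast cancer incidences, NHS Borders, 1997-2021 (from the table above)
years <- 1997:2021
incidences <- c(71, 69, 133, 69, 81, 131, 67, 62, 179, 68, 55, 154, 94,
                86, 157, 103, 114, 130, 90, 98, 136, 97, 98, 107, 149)
is_peak <- years %in% seq(1999, 2017, by = 3)

# Test statistic: mean of peak years minus mean of all other years
diff_in_means <- function(values, peak_flag) {
  mean(values[peak_flag]) - mean(values[!peak_flag])
}
observed <- diff_in_means(incidences, is_peak)

# Null distribution: shuffle the peak labels so any link with incidences is broken
set.seed(2021)  # arbitrary seed, only so the sketch is repeatable
null_diffs <- replicate(5000, diff_in_means(incidences, sample(is_peak)))

# One-sided p-value (H1: peak mean is greater); the +1 avoids reporting exactly zero
p_value <- (sum(null_diffs >= observed) + 1) / (length(null_diffs) + 1)

round(observed, 2)
p_value
```

The observed difference is about 58 more incidences per year in peak years, and very few shuffled labellings produce a difference that large, which is why the p-value falls well below 0.05.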
fig2_plot <- five_year_summary_long %>%
filter(cancer_site == "Breast",
sex == "Female") %>%
ggplot() +
geom_col(aes(x = age, y = incidences,
text = paste0("<b>Age:</b> ", age, "<br>", "<b>Incidences:</b> ", incidences, "<br>")),
fill = "#0391BF") +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
labs(
x = "\n Age",
y = "Incidences\n",
title = "Total Female Breast Cancer Incidences by Age") +
theme(panel.background = element_rect(fill = "white"),
panel.grid = element_line(colour = "grey90"))
ggplotly(fig2_plot, tooltip = "text") %>%
layout(title = list(text = paste0("<b>Total Female Breast Cancer Incidences by Age</b>",
"<br>",
"<sup>",
"NHS Borders: 1997-2021",
"</sup>")))
What does this visualisation tell us?
Why might these age groups see increased incidence numbers?
NHS Borders Population Projections:
Females 50+ 2021: 29889
Females 50+ 2041: 31148 (4.21% increase)
(National Records of Scotland, 2023)
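
The projected increase can be checked with a couple of lines of arithmetic. This sketch uses only the two population figures quoted above; the variable names are illustrative.

```r
# Projected female population aged 50+ in NHS Borders (figures quoted above)
pop_2021 <- 29889
pop_2041 <- 31148

# Absolute and percentage change between the two projection years
abs_change <- pop_2041 - pop_2021
pct_change <- abs_change / pop_2021 * 100

c(absolute = abs_change, percent = round(pct_change, 2))
```

That is 1259 additional women, an increase of a little over 4% across the twenty years.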
Screening data should be reviewed to establish whether the backlog resulting from COVID-19 has been cleared, and therefore whether a further increase in incidences should be anticipated in 2022.
Resources should be allocated according to the observed trend of increased incidences every three years.
Research and analysis should be conducted to further understand and confirm the reasons for this trend, including any links to screening schedules.
Research and analysis should be conducted to establish whether increased incidence with age is in any way the result of current screening criteria and, if so, whether the screening criteria should be widened.
Long-term service planning should take into account the ~4% population increase in the female 50-70 demographic in NHS Borders projected by the National Records of Scotland.
SpatialData.gov.scot Metadata Portal: NHS Scotland Health Boards https://spatialdata.gov.scot/geonetwork/srv/api/records/f12c3826-4b4b-40e6-bf4f-77b9ed01dc14
Public Health Scotland: 5 Year Summary of Incidence by Health Board https://www.opendata.nhs.scot/dataset/annual-cancer-incidence/resource/e8d33b2b-1fb2-4d59-ad21-20fa2f76d9d5
Public Health Scotland: Geography Codes and Labels https://www.opendata.nhs.scot/dataset/geography-codes-and-labels
Public Health Scotland: Incidence by Health Board https://www.opendata.nhs.scot/dataset/annual-cancer-incidence/resource/3aef16b7-8af6-4ce0-a90b-8a29d6870014
NHS National Services Scotland, 2022: https://www.nss.nhs.scot/specialist-healthcare/screening-programmes/breast-screening/
National Records of Scotland, 2023: https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/population/population-projections/sub-national-population-projections/2018-based/detailed-datasets
Public Health Scotland, 2022: https://www.publichealthscotland.scot/media/12843/2022-04-26_breast_screening_report.pdf